Feature PyMuPDF pikepdf PyPDF2 pdfrw pdfplumber / pdfminer
Supports Multiple Document Formats PDF, XPS, EPUB, MOBI, FB2, CBZ, SVG, TXT, Image, DOCX, XLSX, PPTX, HWPX (see note) PDF PDF PDF PDF
Implementation Python and C Python and C++ Python Python Python
Render Document Pages All document types No rendering No rendering No rendering No rendering
Write Text to PDF Page (PyMuPDF: see Page.insert_htmlbox, Page.insert_textbox, or TextWriter)
Supports CJK characters
Extract Text All document types PDF only PDF only
Extract Text as Markdown (.md) All document types
Extract Tables All document types PDF only
Extract Vector Graphics All document types Limited
Draw Vector Graphics (PDF)
Based on Existing, Mature Library MuPDF QPDF
Automatic Repair of Damaged PDFs
Encrypted PDFs Limited Limited
Linearized PDFs
Incremental Updates
Integrates with Jupyter and IPython Notebooks
Joining / Merging PDF with other Document Types All document types PDF only PDF only PDF only PDF only
OCR API for Seamless Integration with Tesseract All document types
Integrated Checkpoint / Restart Feature (PDF)
PDF Optional Content
PDF Embedded Files Limited Limited
PDF Redactions
PDF Annotations Full Limited
PDF Form Fields Create, read, update Limited, no creation
PDF Page Labels Read-only
Supports Font Subsetting